(54) An architecture for self-developing devices

(57) A self-developing device (1) capable of open-ended development makes use of a special motivational system for selecting which action should be taken on the environment by an associated sensory-motor apparatus (2). For a given candidate action, a motivational module (11) calculates a reward associated with the corresponding values that would be taken by one or more motivational variables that are independent of the nature of the associated sensory-motor apparatus. Preferred motivational variables are dependent on the developmental history of the device (1), and are variables quantifying the predictability, familiarity and stability of sensory-motor variables serving as the inputs to the device (1). The sensory-motor variables represent the status of the external environment and/or the internal resources (3) of the sensory-motor apparatus (2) whose behaviour is controlled by the self-developing device (1). Open-ended development is enabled by attributing a reward which is proportional to the rate of change of the history-dependent motivational variables.

Description

[0001] The present invention relates to an architecture for self-developing devices. More particularly, the invention relates to self-developing devices adapted so as to be capable of continuously developing new know-how (this is sometimes referred to as the capacity to engage in "lifelong learning").

[0002] The present invention typically finds application in sensory-motor devices such as robotic devices.

[0003] It is to be understood that, in the present document, when the expression "sensory-motor" is used the word "motor" does not necessarily imply physical movement. The word "motor" is used, in opposition to the word "sensory", so as to designate the effect a device or agent has on its environment rather than the perception that agent has of its environment. For a robotic device the term "motor" may indeed designate physical actions performed by the device, e.g. changing the angle of a joint. However, for an autonomous agent implemented in software, the term "motor" can designate signals which the agent causes to be output so as to affect its environment.
[0004] [...] robotics is to build devices capable of [...]. One of the challenges of research in this area is to find principles for robots such that they are capable of developing new competences. These robots usually start with crude capabilities for perception and action, and try to bootstrap new know-how based on their "experience". Several researchers have investigated how some particular competence can emerge using a bottom-up mechanism; see, for example, "Learning and communication in imitation: an autonomous robot perspective" by P. Andry et al., IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 31(5):431-444, Sept. 2001; "Better vision through manipulation" by G. Metta and P. Fitzpatrick, in "Proceedings of the second international workshop on epigenetic robotics: modeling cognitive development in robotic systems", pp. 97-104, ed. C. Prince et al., 2002; "A developmental approach accelerates learning of joint attention" by Y. Nagai et al., Proceedings of the second International conference on development and learning, 2002; and "Articulation of sensory-motor experiences by forwarding forward model" by J. Tani, in "From animals to animats 7", pub. MIT Press, Cambridge, Ma., USA, 2002.

[0005] Traditionally, in motivational systems [...], a possible strategy consists in defining a reward function adapted to the behavior that the robot has to develop. When the agent performs the desired task it receives feedback (a reward), typically from the environment or from an external user. Several state-of-the-art techniques in machine learning show how a robot can learn to behave in order to maximise such a reward function; see, for example, "Reinforcement learning: A survey" by L. P. Kaelbling et al., Journal of Artificial Intelligence Research, 4, 1996.

[0006] Fig. 1A illustrates in schematic form the architecture of a conventional behaviour-based agent adapted to behave so as to maximize a reward function. As shown in Fig. 1A, the architecture of the conventional agent can be represented in terms of the interaction of three processes: a "situation awareness" process, a "motivation" process, and an "actuation" process.

[0007] The "situation awareness" process corresponds to the components and functions within the agent which serve to determine or characterize what is currently happening, and serve to "understand" it or put it into context. This process determines what is the status of the external environment (perceived via the agent's sensors), what is the status of the internal environment (that is, the agent's internal systems and/or resources), and what is the current behaviour being exhibited by the agent (for example, what are the positions of the agent's limbs, the attitude of its head, etc.).

[0008] [...] compare the current situation [...] in order to "understand" the current situation and/or to put it into context. For example, the agent may be able to decide whether the current situation has happened before and, if so, with what frequency, or to attribute a label to the current situation (e.g. "I am under a tree"). This process may be able to anticipate, based on past experience, what will happen in the near future, both at the sensory and motor level, in a general manner. [...] According to some proposals, [...] model can be [...], depending upon the agent's experience.

[0009] The "actuation" process corresponds to the components and functions of the agent which decide which action the agent should perform, and which implement that action. In general, this process will decide, based on data [...], which action should be performed in order to obtain the greatest reward.

[0010] The "motivation" process evaluates the desirability of a given sensory-motor situation. A situation is desirable if it results in significant rewards. Conventionally, the "motivation" process evaluates the desirability of a situation that may arise as a result of an action performed by the agent. Thus, the output from the "motivation" process plays a role in the selection of the action to be performed by the agent.

[0011] In some known systems, several internal "motivational variables" are associated with reward functions, and the "reward" is evaluated using these reward functions. Typically, motivational variables are calculated from the values of internal and external variables which represent, respectively, the status of the agent's internal systems and the status of the agent's sensor inputs.
[0012] Conventionally, the motivational variables and the associated reward functions are specific to the task the robot has to learn. This means that for each new behavior to be developed, new motivational variables and reward functions have to be defined. For example, if it is important for an agent to maintain a correct level of hydration then it would have a sensor detecting the current level of hydration (the output of which constitutes a sensory-motor variable "level of hydration") and an associated motivational variable "Thirst". When the hydration variable reaches a value at or near the low end of the permitted range, the "Thirst" variable will take a large value. This motivational variable is specific to the task of maintaining correct hydration and cannot be used for any other purpose.

[0013] Moreover, the aim of such reward functions is usually to ensure that the agent behaves in a manner which will keep input sensory-motor variables within a predefined range. For example, the sensory-motor variable "level of hydration" must stay within a certain range for correct operation of the agent. A reward function then gives a reward which depends on the value of the motivational variable "Thirst".

[0014] The preferred embodiments of the present invention provide a new kind of system, which can be termed "a self-developing device", that can develop new competences from scratch, driven only by internal motivations. The motivational principles used by the device are independent of any particular task. As a consequence, they can constitute the basis for a general approach to development of sensory-motor competences.

[0015] The preferred embodiments of the invention make use of a motivational system in which internal motivational variables are history-dependent, that is, the value of the motivational variable depends upon the developmental history of the self-developing device (either upon the values taken by associated sensory-motor variables at different times, or upon the evolution of the internal parameters of a device or devices cooperating in the computation of the motivational variable).

[0016] The preferred embodiments of the invention also provide a new kind of device in which behaviour can be selected based on rewards which are proportional to the rate of change (i.e. the derivative) of the value of an internal motivational variable, not just on minimizing or maximizing the motivational variable's value.
[0017] The present invention provides a self-developing device comprising:

input means for determining the value of a set of one or more sensory-motor variables representative of the status of the environment;

control means for outputting a set of one or more control signals adapted to control action of a sensory-motor apparatus with which the self-developing device is associated in use;

a motivation module for calculating a reward associated with a candidate value that can be taken by said set of control signals; and

selection means for deciding, based on the rewards calculated by the motivation module, which value should be taken by said set of control signals, the selection means controlling the control means to output the selected value;

wherein the motivation module is adapted to evaluate rewards based on the value of at least one motivational variable derived from one or more of said set of sensory-motor variables;

characterized in that the motivation module uses a computation device adapted to perform a history-dependent calculation to calculate said at least one motivational variable, said history-dependent calculation being dependent upon at least one of:

a) one or more time-varying internal parameters of the computation device or of a device cooperating with the computation device [...]; and

b) values taken at different times by said at least one sensory-motor variable of said set.

[0018] Because the self-developing device of the present invention changes its behaviour autonomously, driven by motivational principles that are independent of a particular task, the same "engine" can be applied to a variety of sensory-motor development problems and the same device can engage in a process of life-long learning.
[0019] Moreover, the motivational variables applied by the motivation module are history-dependent, that is, their values depend upon the evolution over time of an underlying sensory-motor variable or upon the evolution over time of the internal parameters of the device(s) involved in computing the motivational variable.
[0020] By making use of history-dependent motivational variables, the reward available to the self-developing device when selecting an action changes over time as a result of the history of the device. Thus, the behaviour that is necessary in order to obtain a reward evolves dependent upon the development, or experience, of the self-developing device.

[0021] Preferably, the motivation module calculates a reward of increased value when there is a large change in a history-dependent motivational variable. This drives the self-developing device towards open-ended development, causing it to extend its "awareness" of its environment as it develops new sensory-motor competences. The self-developing device (and any actuators associated with it) engages in sensory-motor exploration: it is by causing action on its environment that the self-developing device recognizes situations. As this sensory-motor exploration continues in an open-ended manner, the "awareness" of the device keeps increasing.
[0022] The developmental path followed by a self-developing device depends upon (a) the sensory-motor apparatus with which it is associated and (b) the environment in which the device is placed. Two identical devices of this type will engage in developmental pathways that tend to be similar because of (a) but different because of (b). As each self-developing device follows a unique developmental path, it can be considered to be unique. In several applications, this uniqueness resulting from the history of the device is what makes it valuable.
[0023] Examples of history-dependent motivational variables which can be used in the present invention are variables quantifying the predictability, familiarity and stability of sensory-motor variables that are input to the self-developing device.

[0024] According to the present invention, the motivation module 11 can calculate the reward associated with a candidate set of motor control signals based on a single reward function associated with a single type of history-dependent motivational variable (for example, a reward function which takes a high value when the predictability of the system increases). However, it is possible to take into account two or more reward functions, using a weighted sum to calculate an overall reward associated with a candidate set of motor control signals.
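The weighted combination of two or more reward functions mentioned above can be illustrated with the following minimal Python sketch. It is not taken from the specification: the function names, the representation of a reward function as a callable over the recent history of one motivational variable, and the example weights are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a reward function receives the recent history of one
# motivational variable (oldest value first, current value last).
RewardFn = Callable[[Sequence[float]], float]


def overall_reward(histories: Sequence[Sequence[float]],
                   reward_fns: Sequence[RewardFn],
                   weights: Sequence[float]) -> float:
    """Weighted sum of the rewards produced by several reward functions.

    histories[k] is the value history fed to reward_fns[k]; weights[k] is an
    assumed, designer-chosen weight for that reward function.
    """
    if not (len(histories) == len(reward_fns) == len(weights)):
        raise ValueError("histories, reward_fns and weights must have the same length")
    return sum(w * fn(h) for h, fn, w in zip(histories, reward_fns, weights))


def reward_increase(h: Sequence[float]) -> float:
    """Rewards an increase of the variable between t-1 and t, otherwise 0."""
    return max(h[-1] - h[-2], 0.0)


def reward_value(h: Sequence[float]) -> float:
    """Rewards the current value of the variable."""
    return h[-1]


if __name__ == "__main__":
    # e.g. predictability rose from 0.2 to 0.5; stability is currently 0.8
    r = overall_reward([[0.2, 0.5], [0.9, 0.8]],
                       [reward_increase, reward_value],
                       [0.7, 0.3])
    print(round(r, 3))  # 0.7 * 0.3 + 0.3 * 0.8 = 0.45
```

Keeping each reward function independent of the others means a new motivational variable can be added simply by appending a history, a function and a weight.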

[0025] In certain preferred embodiments of the invention the self-developing device includes a prediction module which is capable of recurrently predicting a series of future values of the sensory-motor variables and motivational variables. The control means generates a group of different candidate sets of control signals, and the motivation module calculates an expected reward for each candidate set in the group, based on the series of rewards expected to be obtained when the variables take the series of respective future values predicted by the prediction module for that candidate set. The candidate set which produces the greatest expected reward will be selected for output.
[0026] The self-developing device of the present invention is, in effect, a behaviour engine that decides which action should be taken by an autonomous agent with which the self-developing device is associated. In use, this behaviour engine will typically form part of the control portion of a device (such as a robotic device) which includes sensors for determining properties of the external environment and/or the status of the device's internal resources, and includes actuators and [...] to implement the selected action.

[0027] According to this aspect of the invention, a robot (or other autonomous agent) can be obtained which is capable of open-ended learning.

[0028] The above and further objects, features and advantages of the present invention will become apparent from the following description of preferred embodiments thereof, given by way of example, and illustrated by the accompanying drawings, in which:

Fig. 1 shows, in schematic form, how the architecture of a device can be represented in terms of the interaction of three processes, in which:

Fig. 1A represents the architecture of a conventional behaviour-based agent, and

Fig. 1B represents the architecture of a self-developing device according to the present invention;

Fig. 2 is a block diagram showing schematically the main components of a sensory-motor apparatus and of a self-developing device according to a preferred embodiment of the present invention;

Fig. 3 is a graph showing the evolution of the average of a motivational variable "Predictability", P(t), during a first experiment, by simulation, in which a reward function generates a reward for increases in the "Predictability" motivational variable, with regard to a first embodiment of the invention implementing a simple vision system;

Fig. 4 is a graph showing the evolution of the "head pan position" variable, h_pan, during the experiment of Fig. 3;

Fig. 5 is a graph showing the evolution of the average of a motivational variable "Familiarity", F(t), during a second experiment, by simulation, in which a reward function generates a reward for increases in the "Familiarity" motivational variable, with regard to the first embodiment of the invention;

Fig. 6 is a graph showing the evolution of the "head pan position" variable, h_pan, during the experiment of Fig. 5;

Fig. 7 is a graph showing the evolution of the average of a motivational variable "Stability of head pan position", σ_hpan(t), and the evolution of the average of a motivational variable [...], during an experiment, performed by simulation, in which a reward function generates a reward for maximization of the stability of the head position, with regard to the first embodiment of the invention;

Fig. 8 is a graph showing the evolution of the "head pan position" variable, h_pan, during the experiment of Fig. 7;

Fig. 9 is a graph showing the evolution of the average of a motivational variable "Stability of the light's relative position in the pan direction", [...], and the evolution of the average of a motivational variable "Stability of the light's relative [...]", during an experiment performed by simulation, in which [...], with regard to the first embodiment of the invention;

Fig. 10 is a graph showing the evolution of the "head pan position" variable, h_pan, during the experiment of Fig. 9;
Fig. 11 is a photograph of a robotic device whose behaviour can be controlled by making use of the motivational variables employed in the simulations of Figs. 3 to 10;

Fig. 12 is a graph showing the evolution of six motivational variables of the device of Fig. 11 during a further experiment;

Fig. 13 is a graph showing the evolution of the head pan position of the device of Fig. 11, and the position of a perceived light, in the experiment of Fig. 12;

Fig. 14 shows, at magnified scale, a detail of Fig. 13;

Fig. 15 shows the head pan-tilt trajectory of the device of Fig. 11 during the experiment of Fig. 12;
Fig. 16 is a sequence of 6 images which illustrates the evolution of the behaviour of a retina according to a second embodiment of the present invention, during an experiment;

Fig. 17 is a graph showing the evolution of a motivational variable [...] during the experiment of Fig. 16;

Fig. 18 is a combined graph/image illustrating the trajectory of the centre of the retina at the beginning of the experiment of Fig. 16;

Fig. 19 is a combined graph/image illustrating the trajectory of the centre of the retina at the end of the experiment of Fig. 16;

Fig. 20 is a graph showing the percentage of time steps during which the centre of the retina was located in the face region of an image, during the experiment of Fig. 16;
Fig. 21 is a diagram illustrating schematically a first-order coupling achieved as a self-developing device according to the invention gains knowledge of the environment;

Fig. 22 is a diagram illustrating schematically a second-order coupling achieved as two self-developing devices according to the invention gain knowledge of the environment and of each other;

Fig. 23 is a diagram illustrating schematically a third-order coupling achieved as two self-developing devices according to the invention acquire the ability to enter into complex interactions with each other; and

Fig. 24 is a diagram schematically illustrating the way in which a self-developing device according to the present invention can progressively develop so as to establish first-, second- and third-order couplings.

[0029] The general architecture of a self-developing device according to the present invention will first be described, and then examples will be given illustrating how this architecture is used for bootstrapping new sensory-motor know-how.

[0030] In the following description, it will be assumed that the self-developing device is associated with a robotic apparatus of some kind, which acts in relation to the external environment via one or more actuators. However, it is to be understood that the self-developing device of the invention can also be embodied in software agents and the like which act on the environment in ways which do not involve physical motion.

[0031] Fig. 2 illustrates the main components of a self-developing device 1 according to a preferred embodiment of the invention, which serves as a behaviour engine for a robotic apparatus 2. In other words, the behaviour of the robotic apparatus 2 is controlled by the self-developing device 1.

[0032] The self-developing device (SDD) 1 is a sensory-motor device which derives information about the environment internal to the robotic apparatus 2 (that is, data regarding the status of the robotic apparatus's internal resources 3) via an interface 4, and obtains information about the external environment from sensors, S, via an interface 5. The SDD 1 is also aware of the state of the current behaviour of the robotic apparatus 2 in the external environment, that is, it is aware of the status of the various actuators A of the robotic apparatus 2. The actuators A [...].

[0033] The SDD 1 has a controller 10, a motivation module 11 and a prediction module 12. The motivation module 11 includes a computation device 15 for computing the values of the motivational variables. The computation device 15 may cooperate with other components, for example the prediction module 12, in order to calculate the value of one or more motivational variables. The prediction module preferably includes [...] which take the current sensory-motor vector as input and try to predict the future sensory-motor situation and the future state of the motivation vector.

[0034] The architecture and functioning of the SDD 1 will now be described in greater detail.

Inputs and Outputs

[0035] At any given time t the SDD's perception of the environment (external and internal) can be summarized by a vector S(t), and its action on the external environment can be summarized by a vector M(t). The components of the vector S(t) comprise the values of the signals derived from the internal and external sensors, and the components of the vector M(t) are the values of variables describing the status of the actuators. In general M(t) will correspond to the control signals sent to the actuators at the time t. The effect of the signal sent to an actuator may be delayed, depending on how the actuator works. The sensory-motor vector SM(t) summarizes both kinds of information.
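As an illustration of how S(t), M(t) and SM(t) relate, the following Python sketch (an assumption, not part of the specification) forms SM(t) by concatenating the sensory and motor vectors and keeps a bounded history so that SM(t-1), SM(t-2), etc. remain available to the other processes. The class name and the history length are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class SensoryMotorHistory:
    """Bounded history of sensory-motor vectors SM(t) = S(t) followed by M(t)."""

    def __init__(self, maxlen: int = 10) -> None:
        # Oldest entries are discarded automatically once maxlen is reached.
        self._buf: Deque[List[float]] = deque(maxlen=maxlen)

    def record(self, s: Sequence[float], m: Sequence[float]) -> List[float]:
        """Store and return SM(t) built from the sensory part s and motor part m."""
        sm = list(s) + list(m)
        self._buf.append(sm)
        return sm

    def past(self, k: int) -> List[float]:
        """Return SM(t-k), where k = 0 is the most recently recorded vector."""
        if not 0 <= k < len(self._buf):
            raise IndexError("requested time step is not in the stored history")
        return self._buf[-1 - k]


if __name__ == "__main__":
    history = SensoryMotorHistory(maxlen=3)
    history.record(s=[0.2, 0.9], m=[0.1])   # SM(t-1)
    history.record(s=[0.3, 0.8], m=[0.0])   # SM(t)
    print(history.past(0), history.past(1))
```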

[0036] Incidentally, it is not mandatory for S(t) and M(t) to [...] sensors [...] of the robotic device 2 with which the SDD 1 is associated.

[0037] The behavior of the SDD consists in determining what should be its current behaviour M(t) based on the current perceived (sensory) situation S(t) and on previous sensory-motor situations SM(t-1), SM(t-2), etc. Given the constraints [...], the SDD 1 develops in an unsupervised manner.

[0038] [...] the environment [...] in a manner which does not require physical motion, for example when the SDD is a software agent which acts on the environment by outputting signals of various kinds.

[0039] In a similar way to a conventional behaviour-based agent, the architecture of a self-developing device according to the present invention can be schematized by the interaction of three processes, as illustrated schematically in Fig. 1B. The three processes can be characterized as: a "motivation" process, a "prediction" process and an "actuation" process.

[0040] The "motivation" process evaluates the desirability of a given sensory-motor situation, using [...] motivational variables [...] associated with a set of reward functions R. An important feature of the self-developing device of the present invention is the use of motivational variables which are independent of the nature of the sensory-motor apparatus whose behaviour is being controlled. These variables are the result of internal computations based on the behavior of the two other processes (Prediction and Actuation); see below. The "motivation" process is conducted by the motivation module 11 represented in Fig. 2.

[0041] The computation device 15 computes values for the motivational variables Mot(t) based on SM(t). Advantageously, the computation device 15 is capable of making computations of three kinds:

a) computations in which the value of Mot(t) depends only on SM(t), that is:

$$\mathrm{Mot}(t) = f(\mathrm{SM}(t))$$

These will be traditional kinds of motivational variables.

b) computations that are "historic" in the sense that the value of Mot(t) depends upon SM(t), SM(t-1), etc., that is:

$$\mathrm{Mot}(t) = f(\mathrm{SM}(t), \mathrm{SM}(t-1), \mathrm{SM}(t-2), \ldots)$$

In other words, the values of the motivational variables depend upon sensory-motor vector values at more than one time.

c) computations that are "historic" or "development-dependent" because the manner of computing Mot(t) based on SM(t) depends upon the history of the computation device and/or the history of one or more other devices which cooperate with the computation device in the calculation of the motivational variables, that is:

$$\mathrm{Mot}(t) = g(\mathrm{SM}(t))$$

where g changes over time as the computation device (and/or cooperating device) evolves.

In this case, it is necessary to have knowledge of parameters internal to the computation device 15 and cooperating devices in order to determine what is the appropriate function g to apply at a given time. The nature of these internal parameters depends upon the nature of the computation device and cooperating devices: if they cooperate in a computation using a neural network, then "type (c)" computations of Mot(t) require knowledge of the current values of the weights of the neural network; if one of the devices cooperating in the computation is prototype-based, then "type (c)" computations of Mot(t) require knowledge of the current set of prototypes, etc.

The computation device 15 and devices, such as the prediction module 12, which cooperate in the computation of motivational variables are adaptive devices, which means that the values of their internal parameters will change based on experience. Thus, the appropriate function g changes over time.
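The distinction between computations of types a), b) and c) can be made concrete with the toy Python sketch below. The three computations are hypothetical examples chosen only to show where the dependence on history lies: type a) looks at SM(t) alone, type b) also remembers a past input SM(t-1), and type c) keeps an internal parameter (here a single "prototype" value, an assumption) that changes with experience, so the mapping g applied to SM(t) changes over time.

```python
from typing import Optional, Sequence


def mot_type_a(sm_t: Sequence[float]) -> float:
    """Type a): Mot(t) = f(SM(t)); here simply the first component of SM(t)."""
    return sm_t[0]


class MotTypeB:
    """Type b): Mot(t) = f(SM(t), SM(t-1), ...).

    Toy example: absolute change of the first component between t-1 and t.
    """

    def __init__(self) -> None:
        self.prev: Optional[float] = None   # stores a past input, not a parameter of f

    def __call__(self, sm_t: Sequence[float]) -> float:
        value = 0.0 if self.prev is None else abs(sm_t[0] - self.prev)
        self.prev = sm_t[0]
        return value


class MotTypeC:
    """Type c): Mot(t) = g(SM(t)), where g depends on an evolving internal parameter.

    Toy example: distance between SM(t)[0] and a prototype that drifts towards
    each observed input, so the same input yields different values over time.
    """

    def __init__(self, rate: float = 0.5) -> None:
        self.rate = rate
        self.prototype: Optional[float] = None

    def __call__(self, sm_t: Sequence[float]) -> float:
        x = sm_t[0]
        if self.prototype is None:
            self.prototype = x
        value = abs(x - self.prototype)                      # apply the current g
        self.prototype += self.rate * (x - self.prototype)   # g evolves with experience
        return value


if __name__ == "__main__":
    c = MotTypeC(rate=0.5)
    # Same input [1.0] twice, but g has changed in between.
    print(c([0.0]), c([1.0]), c([1.0]))  # 0.0 1.0 0.5
```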

[0042] The "prediction" process tries to predict the evolution of the sensory-motor trajectories, in other words what SM(t) will be given SM(t-1), SM(t-2), etc. The "prediction" process is conducted by the prediction module 12 of Fig. 2, which makes use of three prediction devices dedicated, respectively, to the prediction of M(t), S(t) and Mot(t); [...] by these prediction devices.

[0043] Finally, the "actuation" process decides which action should be performed. As illustrated in Fig. 2, the "actuation" process is conducted by a controller 10, which can be regarded as an "action centre". According to the present invention, this controller 10 implements four functions:

(a) generation of candidate motor commands,

(b) anticipation of future sensory-motor situations (using the "prediction" process, i.e. the prediction module 12), that is, prediction of SM(t+1), SM(t+2), etc. from SM(t), SM(t-1), SM(t-2), etc., based on the expected consequences of implementing the candidate motor commands,

(c) evaluation of each candidate in terms of the corresponding reward (using the "motivation" process) and, eventually,

(d) selection of the best (i.e. most rewarding) motor commands from amongst the candidates.
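The four functions (a) to (d) of the controller 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the controller of the specification: the predictor and reward callables, the choice to hold each candidate command constant over a fixed horizon, and the summation of the predicted rewards into a single expected reward are all hypothetical choices.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def select_motor_command(
    sm_history: Sequence[Vector],
    candidates: Sequence[Vector],
    predict: Callable[[Sequence[Vector], Vector], Vector],
    reward: Callable[[Vector], float],
    horizon: int = 3,
) -> Optional[Vector]:
    """Return the candidate motor command with the greatest expected reward.

    (a) `candidates` holds the generated candidate motor commands;
    (b) `predict(history, m)` anticipates the next SM vector if m is applied,
        and is called recurrently to obtain SM(t+1), SM(t+2), ...;
    (c) `reward(sm)` evaluates each predicted SM vector;
    (d) the candidate whose summed predicted reward is largest is selected.
    """
    best: Optional[Vector] = None
    best_value = float("-inf")
    for m in candidates:
        history = list(sm_history)
        expected = 0.0
        for _ in range(horizon):
            predicted = predict(history, m)
            expected += reward(predicted)
            history.append(predicted)      # feed the prediction back in
        if expected > best_value:
            best, best_value = m, expected
    return best


def toy_predict(hist: Sequence[Vector], m: Vector) -> Vector:
    """Toy world: the first SM component moves by m[0] per step."""
    return [hist[-1][0] + m[0], m[0]]


def toy_reward(sm: Vector) -> float:
    """Toy reward preferring positions close to 5.0."""
    return -abs(sm[0] - 5.0)


if __name__ == "__main__":
    print(select_motor_command([[0.0, 0.0]], [[1.0], [2.0], [-1.0]],
                               toy_predict, toy_reward))  # [2.0]
```

Feeding each prediction back into the history is what makes the prediction recurrent; in a real device the reward would be computed from predicted motivational variables rather than from a fixed target.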

[0044] The motivation, prediction and actuation processes evolve based on the experiences of the SDD 1, as indicated by the arrows shown in the circles representing those processes in Fig. 1B. What the SDD is aware of, what it looks for and the way it acts on its environment change over time as the result of its developmental trajectory (this evolution is indicated by the arrow surrounding the three processes in Fig. 1B). Each of these processes will now be considered in more detail.

Motivation 

[0045] In the present invention, the motivational variables are independent of the nature of the sensory-motor apparatus with which the SDD is being used; they are abstract. Such motivational variables can be applied to the [...] and development of substantially any sensory-motor device. Moreover, these motivational variables are independent of the particular task being performed by the apparatus associated with the SDD.

[0046] In order to create the conditions for an open-ended sensory-motor exploration, motivational variables have been chosen whose value depends on the developmental history of the device. This means that the way of receiving rewards for such motivations is constantly changing as the device develops. These motivational variables are calculated using the computation device 15 according to computations of types b) and c) described above. It can be advantageous for the computation device 15 to cooperate with other components of the SDD 1 in order to calculate the motivational variables.

Preferred Motivational Variables

[0047] Three kinds of history-dependent motivational variables have been used in the SDD 1 according to the present invention with good results: these are "Predictability", "Familiarity" and "Stability". However, [...] it is expected that other kinds of history-dependent motivational variables could be used (based on the history of SM(t) or on the evolution of the internal parameters of the device(s) involved in the computation of the motivational variables).
[0048] Predictability: The "predictability" motivational variable seeks to quantify to what extent the SDD can predict the current sensory context S(t) based on the previous sensory-motor context SM(t-1). As mentioned above, the SDD 1 is equipped with a prediction module 12 that tries to learn sensory-motor trajectories. If e(SM(t-1), S(t)) is the current error for predicting S(t) using the predictor based on SM(t-1), then P(t) is given by:

$$P(t) = 1 - e(\mathrm{SM}(t-1), S(t))$$

It will be seen that calculation of the motivational variable P(t) involves data from the prediction module 12 as well as from the computation device 15. The value of P(t) for a given sensory-motor sequence will thus depend upon the evolution of the internal parameters of both these components.
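A minimal Python sketch of the predictability variable is given below, assuming P(t) = 1 - e(SM(t-1), S(t)) with the prediction error e normalised to the interval [0, 1]. The choice of error measure (a clipped mean absolute difference between the predicted and the observed S(t)) is a hypothetical example; the specification does not fix one here.

```python
from typing import Sequence


def prediction_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Hypothetical error e: mean absolute difference, clipped to [0, 1].

    Assumes sensory values are scaled to [0, 1], so that 1 - e is meaningful.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("vectors must be non-empty and of equal length")
    err = sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
    return min(max(err, 0.0), 1.0)


def predictability(predicted_s: Sequence[float], observed_s: Sequence[float]) -> float:
    """P(t) = 1 - e(SM(t-1), S(t)).

    `predicted_s` is the prediction module's estimate of S(t), made from SM(t-1);
    `observed_s` is the S(t) actually perceived.
    """
    return 1.0 - prediction_error(predicted_s, observed_s)


if __name__ == "__main__":
    print(predictability([0.5, 0.5], [0.5, 0.5]))  # perfect prediction: 1.0
    print(predictability([0.0, 1.0], [0.5, 0.5]))  # mean error 0.5: 0.5
```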

[0049] Familiarity: The "familiarity" motivational variable seeks to quantify to what extent the sensory-motor transition that leads to S(t) from SM(t-1) is a common pathway. The computation device 15 of the SDD 1 is equipped with a device which counts the frequency of transitions, that is, the number of times the sensory-motor transition SM(t-1) → S(t) has occurred in the time period {(t-T) to t}. If f_T(SM(t-1), S(t)) is the current frequency of the transition that leads to S(t), the motivational variable F(t) is defined as:

$$F(t) = f_T(\mathrm{SM}(t-1), S(t))$$
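The following Python sketch shows one way of maintaining the transition frequency f_T(SM(t-1), S(t)) over a sliding window of the last T transitions. It assumes that continuous sensory-motor values are discretised by rounding so that identical transitions can be counted, and it normalises the count by the number of transitions currently in the window; both choices, and the class name, are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Sequence, Tuple

Key = Tuple[Tuple[float, ...], Tuple[float, ...]]


class FamiliarityTracker:
    """F(t) = f_T(SM(t-1), S(t)) estimated over the last `window` (= T) transitions."""

    def __init__(self, window: int = 100, decimals: int = 1) -> None:
        self.window = window
        self.decimals = decimals
        self.recent: Deque[Key] = deque()
        self.counts: Counter = Counter()

    def _key(self, sm_prev: Sequence[float], s_now: Sequence[float]) -> Key:
        # Round each component so that nearby continuous values share a key.
        def rnd(v: Sequence[float]) -> Tuple[float, ...]:
            return tuple(round(x, self.decimals) for x in v)
        return (rnd(sm_prev), rnd(s_now))

    def update(self, sm_prev: Sequence[float], s_now: Sequence[float]) -> float:
        """Record the transition SM(t-1) -> S(t) and return its current frequency."""
        key = self._key(sm_prev, s_now)
        self.recent.append(key)
        self.counts[key] += 1
        if len(self.recent) > self.window:   # forget transitions older than T steps
            self.counts[self.recent.popleft()] -= 1
        return self.counts[key] / len(self.recent)


if __name__ == "__main__":
    tracker = FamiliarityTracker(window=4)
    for sm_prev, s_now in [([0.0], [1.0]), ([1.0], [0.0]), ([0.0], [1.0])]:
        print(round(tracker.update(sm_prev, s_now), 3))
```

With an unbounded window the same tracker would measure familiarity over the whole lifetime of the device; the window length T is therefore a design parameter.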

[0050] Stability: The "stability" motivational variable seeks to quantify whether or not the sensory variables s_i of S(t) remain close to their average value. The computation device 15 of the SDD 1 tracks the average value <s_i>_T for the recent period {(t-T) to t}. So, for each sensory variable s_i, one possible definition for the stability σ_i(t) is given by:

[0051]

$$\sigma_i(t) = 1 - \left(s_i(t) - \langle s_i \rangle_T\right)^2$$

and the "Stability" motivational variable is computed using computations of type (b) described above. The "Familiarity" motivational variable may be calculated using computations of type (b) or type (c), depending upon whether the frequency of occurrence of a given transition is evaluated over the whole lifetime or operation of the device or over a shorter period, and on whether or not the length of this period is adaptive.
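A minimal Python sketch of the stability measure for one sensory variable follows, assuming σ_i(t) = 1 - (s_i(t) - <s_i>_T)^2 with values scaled to [0, 1]. Whether the current value s_i(t) is included in the running average <s_i>_T is not specified here; the sketch includes it, which is an assumption, as is the class name.

```python
from collections import deque
from typing import Deque


class StabilityTracker:
    """sigma_i(t) = 1 - (s_i(t) - <s_i>_T)^2 for a single sensory variable s_i."""

    def __init__(self, window: int = 10) -> None:
        # Last `window` (= T) values of s_i, used for the running average <s_i>_T.
        self.values: Deque[float] = deque(maxlen=window)

    def update(self, s_now: float) -> float:
        """Add s_i(t) to the window and return sigma_i(t)."""
        self.values.append(s_now)
        mean = sum(self.values) / len(self.values)
        return 1.0 - (s_now - mean) ** 2


if __name__ == "__main__":
    tracker = StabilityTracker(window=4)
    for s in [0.5, 0.5, 1.0]:
        print(round(tracker.update(s), 4))  # 1.0, 1.0, then about 0.8889
```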

Reward functions

[0052] Each motivational variable v is associated with a reward function r(v,t). It takes the general form:

$$r(v,t) = f(v(t), v(t-1), v(t-2), \ldots)$$

In other words, the value of the reward depends upon one, two or a series of successive values of the motivational variable v.

[0053] In the preferred embodiments of the present invention four kinds of reward functions can be used: r_max(v,t), r_min(v,t), r_inc(v,t) and r_dec(v,t).

[0054] r_max(v,t) (or r_min(v,t)): With this reward function, the device is rewarded when it maximizes (or minimizes) the value v of the motivational variable. This is similar to the way in which motivations are generally treated (e.g. homeostatic models in "Designing sociable robots" by C. Breazeal, Bradford book - MIT Press, 2002).

$$r_{max}(v,t) = v(t)$$

From this definition of r_max(v,t) it follows that the reward is maximized when the value of the motivational variable is maximized. (To reward minimization of the motivational variable one could use the definition r_min(v,t) = 1 - v(t).)

[0055] r_inc(v,t) (or r_dec(v,t)): With this reward function, the device is rewarded when it maximizes the increase (or the decrease) of the variable v, instead of maximizing (or minimizing) the variable itself. In this case it can be considered that:

- when the SDD seeks to maximize increases in the "predictability" motivational variable it is seeking "learning" experiences, that is, experiences through which it learns about the environment,

- when the SDD seeks to maximize increases in the "familiarity" motivational variable it is seeking "discoveries", and

- when the SDD seeks to maximize increases in the "stability" motivational variable it is seeking strategies that lead to a better control of its environment.

As shall be seen, this reward function r_inc(v,t) is important with regard to the dynamics of the system.
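[0056] The reward function r_inc(v,t) is defined as follows:

$$r_{inc}(v,t) = \begin{cases} v(t) - v(t-1) & \text{if } v(t) > v(t-1) \\ 0 & \text{if } v(t-1) \geq v(t) \end{cases}$$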
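For illustration, the reward functions can be written directly in Python, each operating on the history of a motivational variable v with the current value v(t) last. r_max, r_min and r_inc follow the definitions given in the text (r_min assuming v is normalised to [0, 1]); r_dec is assumed here to be the mirror image of r_inc, rewarding decreases, since its explicit definition is not given in this passage.

```python
from typing import Sequence

# Each function takes the history of a motivational variable v, oldest first,
# so v[-1] is v(t) and v[-2] is v(t-1).


def r_max(v: Sequence[float]) -> float:
    """Reward high values: r_max(v, t) = v(t)."""
    return v[-1]


def r_min(v: Sequence[float]) -> float:
    """Reward low values (v normalised to [0, 1]): r_min(v, t) = 1 - v(t)."""
    return 1.0 - v[-1]


def r_inc(v: Sequence[float]) -> float:
    """Reward increases: v(t) - v(t-1) if v(t) > v(t-1), otherwise 0."""
    return v[-1] - v[-2] if v[-1] > v[-2] else 0.0


def r_dec(v: Sequence[float]) -> float:
    """Assumed mirror of r_inc: v(t-1) - v(t) if v(t-1) > v(t), otherwise 0."""
    return v[-2] - v[-1] if v[-2] > v[-1] else 0.0


if __name__ == "__main__":
    history = [0.25, 0.75]          # the variable rose from 0.25 to 0.75
    print(r_max(history), r_min(history), r_inc(history), r_dec(history))
    # 0.75 0.25 0.5 0.0
```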

[0057] With this function, the greater the increase in the motivational variable, the greater the reward generated by the reward function.
[0058] When the self-developing device explores a new behaviour based on r_inc(v,t), there will initially be relatively large changes in history-dependent motivational variables such as predictability P(t), familiarity F(t) and stability σ(t). Thus, initially the reward function r_inc(v,t) will produce a large reward associated with this behaviour. However, as the self-developing device becomes better acquainted with the "new" behaviour (which equates to gaining knowledge of its environment), the values of these motivational variables will change by smaller and smaller amounts. Thus, the reward function r_inc(v,t) will yield a smaller reward for adopting this behaviour. By using the reward function r_inc(v,t) with history-dependent motivational variables, the self-developing device is driven to explore new behaviours once others have been mastered.
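The shrinking reward described in [0058] can be checked numerically. The sketch below assumes, purely for illustration, that while a new behaviour is being mastered the predictability follows the hypothetical saturating curve P(t) = 1 - exp(-t/τ) with τ = 5. The rewards returned by r_inc(v,t) then decrease at every step.

```python
import math


def r_inc(v, t):
    """r_inc(v, t) = v(t) - v(t-1) if positive, otherwise 0."""
    delta = v[t] - v[t - 1]
    return delta if delta > 0 else 0.0


TAU = 5.0  # hypothetical learning time constant
# Hypothetical saturating predictability curve for one newly explored behaviour.
P = [1.0 - math.exp(-t / TAU) for t in range(30)]
rewards = [r_inc(P, t) for t in range(1, len(P))]

for t in (1, 10, 29):
    print(f"t={t:2d}  P={P[t]:.3f}  r_inc={rewards[t - 1]:.4f}")
```

Once the rewards for a mastered behaviour have dwindled, any behaviour whose motivational variables are still changing quickly offers larger rewards. This is the mechanism that pushes the device towards new behaviours.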

[0059] The effects of r_inc(v,t) and r_dec(v,t) are not symmetrical. To some extent, r_dec(v,t) achieves a similar result to a reward function which seeks to minimize an associated motivational variable, whereas r_inc(v,t) has a function which is not similar to that of a reward function seeking to maximize the associated motivational variable. With the history-dependent motivational variables P(t), F(t) and σ(t), r_dec(v,t) drives the system to behave in a manner which will lead to an increase in prediction error, into situations which are less and less familiar and into situations which are more and more unstable. If the sensory-motor device is in a safe environment, the use of r_dec(v,t) with P(t), F(t) and σ(t) can provide good exploration strategies.

[0060] The properties of the four reward functions can be summarized as follows:

- r_max(v,t) produces conservative strategies, causing the sensory-motor device to do what it already does best.
- r_inc(v,t) drives the system to explore unknown situations, using known ones in order to progress.
- r_min(v,t) and r_dec(v,t) cause novelty-driven exploration, directing the system to progress towards what it does not yet master (r_min(v,t)) or towards what seems to be the most difficult to master (r_dec(v,t)). These strategies do not involve undue risk if the sensory-motor device is in a safe environment.

Calculating an "Overall Reward"

[0061] As will be seen from the specific embodiments discussed below, in many applications it is sufficient for the 
self-developing device to evaluate the reward associated with a given behaviour (that is, associated with taking a 
candidate action mi) by using a single reward function based on a single motivational variable. 
[0062] However, it is also possible to make use of two or more reward functions, for example based on respective different motivational variables (although it is also possible to use different reward functions based on a common motivational variable - e.g. the motivational system could seek to optimize r_max(P(t)) and r_inc(P(t)), thus arriving at a compromise between exploration and conservative strategies).

[0063] In a case where the system uses two or more reward functions to evaluate the desirability of a given candidate behaviour, the controller 10 of the SDD 1 must consider the overall reward R(Mot(t)) that would be obtained as a result of this behaviour, taking into account the reward functions associated with all the applicable motivational variables. R(Mot(t)) is computed as a weighted sum, which enables a relative weight to be assigned to each motivational variable when determining the overall reward of vector Mot(t):

$$R(Mot(t)) = \sum_{i} \alpha_i \, r_i(v_i, t)$$

where v_i is the i-th motivational variable of Mot(t), r_i is the reward function associated with it and α_i is its weight.
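The weighted overall reward R(Mot(t)) can be sketched as follows. It assumes that each term pairs one motivational-variable history with one reward function and one weight α_i. The example reproduces the compromise mentioned in [0062], with r_max and r_inc both applied to the predictability P(t). The weights, the values and the function names are hypothetical.

```python
from typing import Callable, Sequence

RewardFn = Callable[[Sequence[float], int], float]


def r_max(v: Sequence[float], t: int) -> float:
    return v[t]


def r_inc(v: Sequence[float], t: int) -> float:
    delta = v[t] - v[t - 1]
    return delta if delta > 0 else 0.0


def overall_reward(
    histories: Sequence[Sequence[float]],
    reward_fns: Sequence[RewardFn],
    weights: Sequence[float],
    t: int,
) -> float:
    """R(Mot(t)) = sum_i alpha_i * r_i(v_i, t) over all weighted terms."""
    if not (len(histories) == len(reward_fns) == len(weights)):
        raise ValueError("need one history, one reward function and one weight per term")
    return sum(a * r(v, t) for v, r, a in zip(histories, reward_fns, weights))


# Compromise of [0062]: r_max and r_inc both applied to the same predictability history.
P = [0.2, 0.6]  # hypothetical values of P(t)
print(overall_reward([P, P], [r_max, r_inc], [0.5, 0.5], t=1))
```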

[0064] The weights α_i can be preset by the designer of the SDD 1. Alternatively, for greater autonomy/task-independence, the weights can be determined automatically by a weight-setting device (not shown) which is used in conjunction with the SDD 1. The automatic weight-setting device will typically implement known machine-learning techniques and select weights in order to maximize an independent "fitness function". The automatic weight-setting device could itself be implemented using a further SDD.

Prediction

[0065] The effectiveness of the device comes from its ability to predict sensory-motor trajectories: recognizing a situation is recognizing a sensory-motor trajectory. This view is held by a number of researchers (see, for example, "La construction du réel chez l'enfant" by J. Piaget, pub. Delachaux & Niestlé, Neuchâtel & Paris, 1937; "The tree of knowledge: the biological roots of human understanding" by H. Maturana and F. Varela, pub. Shambhala, USA, 1992; "A sensory-motor account of vision and visual consciousness" by J.K. O'Regan and A. Noë, in Behavioral and Brain Sciences; and "Le sens du mouvement" by A. Berthoz, pub. Odile Jacob, Paris, France, 1997).

[0066] This view, also known as active perception, is now shared by a growing number of robotics engineers (see, for example, "Active vision and feature selection in evolutionary behavioral systems" by D. Marocco and D. Floreano, and "Better vision through manipulation" by G. Metta and P. Fitzpatrick, op. cit.).

[0067] At a given time t, the self-developing device of the present invention perceives a sensory-motor context that can be summarized in a vector SM(t). As mentioned above, the preferred embodiments of the present invention use three prediction devices, Π_m, Π_s and Π_mot. The three devices take the current situation SM(t) as an input and try to predict, respectively, the future motor situation M(t+1), the future sensory situation S(t+1) and the future state of the motivation vector Mot(t+1).
[0068] At each time step, the three devices compare the predictions they made at the previous time step with the current situation:

$$\Pi_m(SM(t-1)) \rightarrow M(t)$$

$$\Pi_s(SM(t-1)) \rightarrow S(t)$$

$$\Pi_{mot}(SM(t-1)) \rightarrow Mot(t)$$

where → indicates a comparison. Π_m(SM(t-1)) is the prediction of M(t) made by prediction device Π_m based on SM(t-1), Π_s(SM(t-1)) is the prediction of S(t) made by prediction device Π_s based on SM(t-1), and Π_mot(SM(t-1)) is the prediction of Mot(t) made by prediction device Π_mot based on SM(t-1).
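The comparison step of [0068] can be sketched as follows. The description does not say how a prediction is compared with the observed situation. This sketch assumes a Euclidean distance, and the dictionary keys "m", "s" and "mot" standing for Π_m, Π_s and Π_mot are hypothetical names.

```python
import math
from typing import Dict, Sequence

DEVICES = ("m", "s", "mot")  # hypothetical keys for Pi_m, Pi_s and Pi_mot


def compare_predictions(
    predictions: Dict[str, Sequence[float]],
    observed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """For each prediction device, return the distance between what it predicted
    from SM(t-1) and what is observed at time t (M(t), S(t) or Mot(t)).
    math.dist raises ValueError if the two vectors differ in length."""
    return {key: math.dist(predictions[key], observed[key]) for key in DEVICES}


errors = compare_predictions(
    predictions={"m": [0.0, 0.0], "s": [1.0, 1.0, 0.0, 0.0], "mot": [0.5]},
    observed={"m": [3.0, 4.0], "s": [1.0, 1.0, 0.0, 0.0], "mot": [0.5]},
)
print(errors)
```

The error of Π_s is the quantity from which the predictability variable P(t) is determined.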
[0069] The landscape of the motivation that Π_mot must learn is dependent on the performance of the two other prediction devices. The motivational variable P(t) is determined by the error rate of Π_s, and the other motivational variables change according to the action selection process, which in turn results from the predictions of Π_m and Π_s (see below). As a consequence, Π_mot must adapt continuously during the bootstrapping process.
[0070] The prediction devices can be implemented in different manners, for instance:

- using a recurrent Elman neural network with a hidden layer/context layer (see "Finding structure in time" by J.L. Elman, Cognitive Science, 14:179-211, 1990). Because this network is recurrent, it predicts its output based on the values of the sensory-motor vectors several time steps before t.
- using a prototype-based prediction system that learns prototypic transitions and extrapolates the result for unknown regions. It takes the form of a set of vectors associating a static sensory-motor context SM(t-1) with the predicted vector (M(t), S(t) or Mot(t)). New prototypes are regularly learned in order to cover most of the sensory-motor space. The prediction is made by combining the results of the k closest prototypes. This prediction system is faster and more adaptive than the Elman network but may prove to be less efficient at predicting complex trajectories.

The general architecture according to the preferred embodiments of the present invention can be used regardless of the kind of devices that are employed in the prediction module 12. Thus the prediction devices can be implemented using a variety of techniques other than those specified above. However, it is desirable that the selected prediction devices have good performance in order to ensure efficient learning for the system as a whole.
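A prototype-based predictor of the kind described in [0070] might look like the sketch below. Only the overall idea comes from the description: stored context/target pairs, new prototypes added to cover the space, and a prediction taken from the k closest prototypes. The Euclidean metric, the novelty threshold that decides when a prototype is added and the plain average over the k neighbours are assumptions, and the class name is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


class PrototypePredictor:
    """Hypothetical sketch: stores (context, target) prototypes and predicts
    by averaging the targets of the k prototypes closest to the query."""

    def __init__(self, k: int = 3, novelty_threshold: float = 0.1) -> None:
        self.k = k  # number of closest prototypes combined (assumed default)
        self.novelty_threshold = novelty_threshold  # assumed novelty test
        self.prototypes: List[Tuple[List[float], List[float]]] = []

    def learn(self, context: Sequence[float], target: Sequence[float]) -> None:
        """Add a prototype only if the context SM(t-1) is far from every stored one."""
        if all(math.dist(c, context) > self.novelty_threshold for c, _ in self.prototypes):
            self.prototypes.append((list(context), list(target)))

    def predict(self, context: Sequence[float]) -> Optional[List[float]]:
        """Average the targets of the k nearest prototypes (None if nothing is learned yet)."""
        if not self.prototypes:
            return None
        nearest = sorted(self.prototypes, key=lambda p: math.dist(p[0], context))[: self.k]
        dim = len(nearest[0][1])
        return [sum(target[i] for _, target in nearest) / len(nearest) for i in range(dim)]


predictor = PrototypePredictor(k=2, novelty_threshold=0.05)
for x in (0.0, 1.0, 2.0):
    predictor.learn([x], [2.0 * x])  # toy transition: the next value is twice the context
print(predictor.predict([0.9]))
```

Because prediction is a lookup over stored prototypes rather than gradient training, new regions of the sensory-motor space are covered as soon as a prototype is added. This is consistent with the claim that such a system is faster and more adaptive than an Elman network.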


Actuation 

[0071] The actuation module anticipates the possible evolutions of the sensory-motor trajectories and tries to choose the motor commands that should lead to the maximum reward. Several techniques taken from reinforcement learning can be used (see, for example, "Reinforcement learning: A survey" by L.P. Kaelbling et al., op. cit.). In the system according to the preferred embodiment of the present invention, the process can be separated into four phases:

- Generation: The controller 10 constructs a set of candidate motor commands {mi}. To take the example of a simple case, if the current value of an actuator control signal is 0.7, then the controller 10 may randomly shift the current value so as to produce candidate values such as 0.55, 0.67, 0.3, 0.75, for example.
- Anticipation: By using the prediction devices in a recurrent manner, the self-developing device 1 simulates the sensory-motor trajectories {SM_mi} that can be expected to arise for each of the candidate motor commands over T time steps. The system combines the results of both Π_m and Π_s to predict future sensory-motor situations and uses Π_mot to predict the evolution of the motivation vector Mot(t).
- Evaluation: For each of the trajectories {SM_mi}, an expected reward R_mi is computed as the sum of all the future rewards expected to arise during the T time steps:

  $$R_{m_i} = \sum_{k=1}^{T} R(Mot_{m_i}(t+k))$$

  where Mot_mi(t+k) is the motivation vector predicted at time t+k along the trajectory SM_mi.
- Selection: The motor command mi corresponding to the highest R_mi is chosen for output by the controller 10. In other words, the behaviour of the sensory-motor device 2 associated with the SDD 1 will be controlled according to the candidate command signal mi giving the greatest reward.
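The four actuation phases can be sketched as a single function. This is an illustration under stated assumptions rather than the controller 10 itself. It assumes a scalar motor command, uniform random shifts of at most ±spread for the Generation phase, and a candidate command held constant over the T = horizon simulated steps. The callables predict_next_sm (standing for the combined use of Π_m and Π_s), predict_mot (standing for Π_mot) and reward_of_mot (standing for R(Mot(t))) are hypothetical placeholders.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def select_action(
    sm_t: Sequence[float],
    current_command: float,
    predict_next_sm: Callable[[Vector, float], Vector],  # hypothetical: Pi_m and Pi_s combined
    predict_mot: Callable[[Vector], Vector],  # hypothetical: Pi_mot
    reward_of_mot: Callable[[Vector], float],  # hypothetical: R(Mot(t))
    n_candidates: int = 8,
    spread: float = 0.2,
    horizon: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    rng = rng or random.Random()
    # Generation: randomly shift the current command to obtain candidate commands m_i.
    candidates = [current_command + rng.uniform(-spread, spread) for _ in range(n_candidates)]

    best_command, best_value = current_command, float("-inf")
    for m in candidates:
        sm = list(sm_t)
        expected_reward = 0.0
        for _ in range(horizon):
            # Anticipation: apply the predictors recurrently, feeding each prediction back in
            # (assumption: the candidate command m is held constant over the horizon).
            sm = predict_next_sm(sm, m)
            # Evaluation: accumulate the reward of the predicted motivation vector.
            expected_reward += reward_of_mot(predict_mot(sm))
        # Selection: keep the candidate with the highest expected reward R_mi.
        if expected_reward > best_value:
            best_command, best_value = m, expected_reward
    return best_command, best_value


# Toy world: a 1-D position moved by the command; the motivation peaks at position 1.0.
command, value = select_action(
    sm_t=[0.0],
    current_command=0.0,
    predict_next_sm=lambda sm, m: [sm[0] + m],
    predict_mot=lambda sm: [-abs(sm[0] - 1.0)],
    reward_of_mot=lambda mot: mot[0],
    horizon=5,
    rng=random.Random(0),
)
print(round(command, 3), round(value, 3))
```

In this toy usage the motivation is highest when the simulated position reaches 1.0. Within the ±0.2 range, no five-step trajectory overshoots that point, so the largest candidate command is selected.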

[0072] In order to evaluate to what extent the self-developing device of the present invention is capable of open-ended learning, the architecture described above was implemented in two embodiments, which are described below.

[0073] A first embodiment is a visual tracking system. The system was intended to learn to track a moving light.

[0074] In the first months of their life, babies develop, almost from scratch, sensory-motor skills enabling them to localize light sources, pay attention to movement and track moving objects (see "Understanding children's development" by P. Smith et al., pub. Blackwell, 1998). The embodiment presented here does not attempt to model this developmental pathway precisely, but to illustrate how general motivational principles can drive the bootstrapping of corresponding competences.

[0075] The AIBO ERS-210, a four-legged robot produced by Sony Corporation, is equipped with a CCD camera and can turn its head in the pan and tilt directions (a third degree of freedom exists but is not exploited in this experiment) - see "Development of an autonomous quadruped robot for robot entertainment" by M. Fujita and H. Kitano, in "Autonomous Robots", 5:7-20, 1998. In the present embodiment, the vision system of the AIBO ERS-210 was simplified to the extreme, and a self-developing device according to the invention was implemented in software in order to process sensory data provided by this vision system and to direct motor control of the pointing of the vision system.
[0076] The robot extracts from each image it analyses the point of maximum intensity. The visual system perceives only the coordinates of this maximum (i_dpan, i_dtilt), expressed relative to the image center. The robot also perceives the position of its head in a pan-tilt coordinate system (h_pan, h_tilt). At each time step its perception can be summarized by a vector S(t) having four dimensions:

$$S(t) = \begin{pmatrix} i_{dpan}(t) \\ i_{dtilt}(t) \\ h_{pan}(t) \\ h_{tilt}(t) \end{pmatrix}$$

[0077] The robot moves its head according to motor commands (m_dpan, m_dtilt).

[0078] So, the sensory-motor vector SM(t) at each time step has 6 dimensions:

$$SM(t) = \begin{pmatrix} m_{dpan}(t) \\ m_{dtilt}(t) \\ i_{dpan}(t) \\ i_{dtilt}(t) \\ h_{pan}(t) \\ h_{tilt}(t) \end{pmatrix}$$



[0079] Initially the SDD does not know anything about the sensory-motor device (here, a robot) with which it is associated. Will the SDD manage to make the robot fix its gaze on a certain number of things in its environment? To do this, it must discover the structure of several couplings in its sensory-motor device. It must discover, notably:

- How a relative command (m_dpan, m_dtilt) affects the next position (h_pan, h_tilt) of the head. (This sensory-motor coupling is constrained only by the robot's body.)
- How a relative command (m_dpan, m_dtilt) affects the movement of the visual field, in particular the position of (i_dpan, i_dtilt). (This sensory-motor coupling is constrained by the robot's body and also by the structure of what happens in the environment.)

[0080] In short, the robot must learn to perceive its environment by moving its head in the right manner.
[0081] A number of different motivational variables and associated reward functions could be defined, in accordance with the present invention, in an attempt to provide the robot with the ability to learn the desired tracking behaviour. For example, it could be contemplated to make use of reward functions based on any or all of the following motivational variables: the predictability variable P(t), the familiarity variable F(t) and four stability variables (one for each sensory variable). This yields a possible motivational vector Mot(t) having 6 dimensions:

$$Mot(t) = \begin{pmatrix} P(t) \\ F(t) \\ \sigma_{idpan}(t) \\ \sigma_{idtilt}(t) \\ \sigma_{hpan}(t) \\ \sigma_{htilt}(t) \end{pmatrix}$$



Simulated environment 



[0082] In order better to understand the role of internal motivation in determining the development of the robot's behaviour, a series of experiments was conducted in a simple simulated environment representing a light performing a sinusoidal movement. The environment was simulated according to the following relationships:

$$light_{pan}(t) = K \sin(p(t))$$

$$light_{tilt}(t) = L \sin(p(t) + \phi)$$

$$p(t+1) = p(t) + \delta$$

where δ is a small increment, L is the magnitude of the oscillations in the tilt domain, K is the magnitude of the oscillations in the pan domain, and φ is the phase difference between the oscillations in the pan and tilt domains. The oscillations in the tilt domain have a smaller amplitude than those in the pan domain (i.e. L < K).
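The simulated light of [0082] can be reproduced with a few lines of Python. The default values of K, L, φ and δ below are hypothetical, chosen only so that L < K, since the description does not give them.

```python
import math
from typing import List, Tuple


def simulate_light(
    steps: int,
    K: float = 1.0,  # hypothetical pan amplitude
    L: float = 0.5,  # hypothetical tilt amplitude (L < K)
    phi: float = math.pi / 4,  # hypothetical pan/tilt phase difference
    delta: float = 0.05,  # hypothetical phase increment per time step
    p0: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return [(light_pan(t), light_tilt(t)) for t = 0 .. steps-1]."""
    p = p0
    trajectory = []
    for _ in range(steps):
        trajectory.append((K * math.sin(p), L * math.sin(p + phi)))
        p += delta  # p(t+1) = p(t) + delta
    return trajectory


for pan, tilt in simulate_light(5):
    print(f"{pan:+.3f} {tilt:+.3f}")
```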
[0083] The robot perceives the relative position of the light compared to its own position.



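[0084] At each time step, the SDD associated with the robot decides the most appropriate action (m_dpan, m_dtilt) to perform. The effect of this action is simulated using the following simple rules.

[0085] The head position is bounded by the limits max_pan, min_pan, max_tilt and min_tilt. In the pan domain:

$$h_{pan}(t+1) = \begin{cases} max_{pan} & \text{if } g_{pan}(t+1) > max_{pan} \\ min_{pan} & \text{if } g_{pan}(t+1) < min_{pan} \\ g_{pan}(t+1) & \text{otherwise} \end{cases}$$

where g_pan(t+1) denotes the pan position that the relative command m_dpan(t) would produce in the absence of these limits.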
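The saturation rule of [0085] can be sketched as a function applied to one axis. The description does not spell out g_pan(t+1). This sketch assumes it is the current position plus the relative command, g_pan(t+1) = h_pan(t) + m_dpan(t), and that the same function can be used for the tilt axis with min_tilt and max_tilt. The function name is hypothetical.

```python
def next_head_position(h: float, m: float, lower: float, upper: float) -> float:
    """One simulated step of the head along a single axis (pan or tilt)."""
    g = h + m  # assumed unclamped position g(t+1): current position plus relative command
    if g > upper:
        return upper  # saturate at max_pan (or max_tilt)
    if g < lower:
        return lower  # saturate at min_pan (or min_tilt)
    return g


print(next_head_position(0.0, 0.3, -1.0, 1.0))  # within limits
print(next_head_position(0.9, 0.3, -1.0, 1.0))  # clipped at the upper limit
```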



[0086] A similar rule applies to h_tilt(t+1). [...]

[0087] In a first experiment, the robot was driven using a reward function based only on its "predictability" motivational variable. More particularly, the self-developing device driving the robot's behaviour made use of a reward function which rewards increases in the predictability level P(t), the magnitude of the reward being proportional to the size of the increase in P(t). In effect, this means that the robot seeks "learning" situations. As it learns, sensory-motor trajectories that used to give rewards tend to become less interesting. These dynamics push the robot towards an open-ended dynamic of exploration.

[0088] Figure 3 shows the evolution of the average predictability level P(t) during this experiment. It quickly reaches a high value. This shows that the robot (or, more precisely, its SDD) has learned the overall effect of movement on the light's position and on the position of its own head. As the robot tries to experience increases in predictability, the curve does not simply stay at the maximum value but oscillates. These oscillations correspond to new sensory-motor trajectories that the robot explores.

[0089] Figure 4 shows the evolution of h_pan, the pan position of the head, during 1000 time steps of the above-described experiment (the corresponding evolution of light_pan is indicated by the sinusoidal curve in Fig. 4). A very similar curve can be plotted for the tilt dimension. The movement is rather complex, as the robot gets away from predictable sensory-motor trajectories and tries to explore new ones. It is interesting to note that the movement is not completely decorrelated from the movement of the light as the robot explores different commands. The evolution of the average head pan position shows that the robot explores the possible pan positions by oscillating around the zero position.
Increasing familiarity

[0090] In a second experiment, the robot was driven using a reward function based only on its "familiarity" motivational variable. More particularly, the self-developing device driving the robot's behaviour made use of a reward function which rewards increases in the familiarity level F(t), the magnitude of the reward being proportional to the size of the increase in F(t). Unfamiliar sensory-motor trajectories become familiar after a while and, as a consequence, less rewarding. These dynamics drive the robot into a continuous exploration behaviour.
[0091] Figure 5 shows the evolution of the average familiarity level F(t) during this experiment. The robot progressively manages to reach a very high level of familiarity. Similarly to the evolution seen during the previous experiment, we see oscillations due to the pressure of experiencing increases in familiarity. Each reduction of the familiarity level corresponds to the exploration of new parts of the sensory-motor space.

[0092] Figure 6 shows the evolution of the head pan position h_pan during the present experiment. The movement looks a bit like the one obtained in the first experiment, but some differences can be noticed. The average position curve shows that the robot first tended to explore positions corresponding to high pan values and then positions corresponding to low pan values. This switch, which seems to occur independently of the oscillation of the light, did not appear as clearly in the experiment on predictability. The familiarity motivation pushes the robot to explore trajectories in the sensory-motor space independently of how well it masters them. Employing reward functions using the familiarity and predictability motivations can thus be seen as two complementary ways for the SDD to get the sensory-motor device to explore different sensory-motor trajectories.

Maximization of sensory stability

[0093] In a third experiment, four motivational variables were considered, concerning the stability of each component of the sensory vector S(t). They were all associated with the maximizing reward function r_max(v,t).

Head stability 

[0094] First of all, the case was considered where the stability concerns the head position. This corresponds to the variables σ_hpan(t) and σ_htilt(t). The self-developing device driving the robot's behaviour employs a reward function which ensures that the robot seeks sensory-motor trajectories in which its head position, in pan and tilt, remains stable in time. Figure 7 shows the evolution of average stability during an experiment in which the robot uses this reward system. This is a simple task: the robot just has to stop moving its head in order to obtain significant rewards. Stability is reached rapidly for both the pan and tilt directions.

[0095] The evolution of the head pan position h_pan during this experiment is graphed in Fig. 8. The evolution observed in Figure 8 shows that the head position stabilizes around its initial position after a short period of oscillation. (The evolution of light_pan is also indicated in Fig. 8.)

Light stability

[0096] Next, the case was considered where rewards were associated with maximizing the stability of (i_dpan, i_dtilt), the relative position of the perceived light. In this case the task is a bit more complex, as this position is only indirectly controlled by the robot. The robot has to discover that it can act upon the relative position of the light by moving its head in the appropriate directions.

[0097] Figure 9 illustrates the evolution of the average stability of σ_idpan and σ_idtilt during an experiment using this reward system. The robot manages to control the stability of the light in the tilt domain faster than in the pan domain, probably because the movement has a smaller amplitude in the tilt domain (L < K).
[0098] Figure 10 illustrates the evolution of the head pan position, h_pan. After a short time of tuning, the robot develops a tracking behaviour and follows the light quite precisely. As the robot seeks sensory stability, each movement of the light can be seen as a perturbation that it learns to compensate for. This visual know-how results directly from the effect of the environment on the sensory-motor device. (The evolution of light_pan is also indicated via the sinusoidal wave shown in Fig. 10.)

[0099] With this series of experiments, we have a clearer idea of the effect of each reward system on the bootstrapping process. The first two motivations, increase in predictability and increase in familiarity, push the robot to explore its sensory-motor device. The last four, maximization of sensory stability, lead the robot, on the one hand, to stop moving its head and, on the other hand, to develop a tracking behaviour.
Experiment on the robot



[0100] A further experiment was conducted on an AIBO ERS-210 (shown in Figure 11). The software components for this experiment were written in C++ using the publicly available OPEN-R standard for the AIBO ERS-210, and the data generated during the experiment were written directly onto a Memory Stick for later analysis.
[0101] At each time step the robot computes the point of maximum light intensity in its visual field. The relative position of this point provides the two inputs i_dpan(t) and i_dtilt(t). The robot measures its own head position h_pan(t) and h_tilt(t). Unlike the case during the simulation experiments discussed above, this measurement is not completely accurate. In the same way, due to different mechanical constraints, the relative movement resulting from the actions m_dpan(t) and m_dtilt(t) can be rather noisy.

[0102] The reward system used could potentially have included all six of the motivational variables previously studied. As mentioned above, in a case where a reward function is associated with each of several motivational variables, the relative weight of each variable in the overall reward is determined by the set of parameters α_i.

[0103] For the present experiment, these weights α_i were set so that the robot developed the know-how for paying attention to the different lights in its environment. This means that the robot should develop not only a tracking behaviour but also an exploratory skill so as not to get stuck in front of a given light.

[0104] As head stability is to some extent counterproductive for such a goal, it was decided that σ_hpan(t) and σ_htilt(t) should not be used as motivational variables in this experiment. As a consequence, all the reward functions were associated with the same weight α_i = k, except the two controlling the head stability (which received the value α_i = 0).
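The weighting chosen in [0104] can be written down explicitly. In this sketch the variable names and the value of k are hypothetical, and the per-variable rewards are placeholder numbers. It only shows that the two head-stability terms drop out of the weighted sum when α_i = 0.

```python
from typing import Dict

K = 1.0  # hypothetical value of the common weight k

# Hypothetical names for the six motivational variables of Mot(t).
VARIABLES = ("P", "F", "sigma_idpan", "sigma_idtilt", "sigma_hpan", "sigma_htilt")
HEAD_STABILITY = {"sigma_hpan", "sigma_htilt"}

# alpha_i = k for every motivational variable except head stability (alpha_i = 0).
weights: Dict[str, float] = {name: 0.0 if name in HEAD_STABILITY else K for name in VARIABLES}


def overall_reward(rewards: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-variable rewards r_i, as in R(Mot(t))."""
    return sum(weights[name] * rewards[name] for name in weights)


example_rewards = {name: 0.5 for name in VARIABLES}  # hypothetical reward values
print(overall_reward(example_rewards, weights))
```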
[0105] The experiment lasted [...] minutes. The robot was placed in front of an uncontrolled office setting. Figure 12 shows the evolution of the six motivational variables during this experiment. As expected, the four motivational variables associated with the weight k obtained high values. The stability of the relative position of the light rapidly reached a plateau, but predictability and familiarity kept increasing. The motivational variables for head stability oscillated at a lower level.
[0106] Figure 13 shows the evolution of the head pan position during the experiment, as well as the position of the perceived light. The head position oscillates around a local light maximum, and larger oscillations occasionally permit the robot to find another local light maximum.

[0107] This behaviour can be seen more clearly in Figure 14, which magnifies a detail from Figure 13. The head pan position increases so as to approach a local maximum, then oscillates around it for a while. At some point a larger oscillation makes the robot discover a higher local maximum. The robot switches back and forth several times between the two maxima and finally continues its exploration towards higher pan values. This kind of behaviour is a typical result of the search to increase predictability and familiarity. The robot uses familiar and predictable contexts as bases for progressively continuing its exploration.

[0108] Figure 15 illustrates the overall pan-tilt trajectory for the duration of the present experiment. It appears that the robot has concentrated its exploration on the right-hand part of the scene. It seems to have thoroughly explored one particular area and progressively searched for other maxima in its immediate neighbourhood. This exploration yields a kind of "map" of light positions with high values, as shown in Figure 15. This map can be used to characterize the place where the robot stands. This representation does not exist as such for the robot but is the result of the know-how it developed with its sensory-motor device. The robot is not capable of perceiving all these light positions at the same time, but it has acquired the sensory-motor (visual) know-how that leads it there.



Second embodiment: an active vision system



[0109] A second embodiment concerns an active vision system, that is, a vision system which concentrates on the active parts of a scene. This system develops the capability of recognizing visual situations through sensory-motor exploration. This embodiment is inspired by research in developmental psychology about attention and eye movements - see "Eye movements and vision" by A.L. Yarbus, pub. Plenum Press, New York, 1967; "Animate vision" by D. Ballard, from Artificial Intelligence, 48:57-86, 1991; and "Control of selective perception using Bayes nets and decision theory" by R.D. Rimey and C.M. Brown, from International Journal of Computer Vision, 12(2):173-207, 1994.

[0110] The vision system described here shares some similarity with an active vision system described by Marocco and Floreano (see "An evolutionary active-vision system" by T. Kato and D. Floreano, Proc. of the Congress on Evolutionary Computation (CEC01), IEEE Press, 2001, and "Active vision and feature selection in evolutionary behavioral systems" by D. Marocco and D. Floreano, op. cit.). However, the latter uses an evolutionary paradigm in order to evolve the desired behaviour: populations of robots are evolved and the best individuals are selected according to a predefined fitness function - see "Evolutionary Robotics: the biology, intelligence, and technology of self-organizing machines" by S. Nolfi and D. Floreano, pub. MIT Press, Cambridge, MA, USA, 2000. By way of contrast, the system according to the present invention relies on a developmental approach to arrive at the desired behaviour.

[0111] In this embodiment, a self-developing device was used to drive the behaviour of a system equipped with an R x R retina. The position of the retina relative to the image can be changed, and the retina can zoom in and out. Based on the zooming factor, the retina averages the colour of the image in order to produce a single value for each cell of the retina. With such a system, it is possible to rapidly scan the pattern present in the image and to zoom in to observe some details more accurately.

[0112] In the present embodiment, in order to become an active vision system, the system has to learn how to "act" on the image by moving and zooming the retina in order to get a higher reward as defined by its reward system.
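[0113] More precisely, for a given image, the sensory vector S(t) contains the value of the R x R pixels of the retina, together with the position (X(t), Y(t)) of the retina and its zooming factor Z(t):

$$S(t) = \begin{pmatrix} Pix_{1,1}(t) \\ \vdots \\ Pix_{R,R}(t) \\ X(t) \\ Y(t) \\ Z(t) \end{pmatrix}$$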



[0114] The motor vector M(t) contains the values for the three possible actions the retina can perform: changing the x and y values and the zooming factor.

$$M(t) = \begin{pmatrix} Dx(t) \\ Dy(t) \\ Dz(t) \end{pmatrix}$$
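The retina of [0111] to [0114] can be sketched as follows. The description says only that the retina averages the image over each of its R x R cells according to the zooming factor. This sketch assumes a greyscale image, an integer zoom equal to the side of each cell in pixels, cells outside the image reading 0.0, and a zoom that cannot drop below 1. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def retina_cells(image: Sequence[Sequence[float]], x: int, y: int, zoom: int, R: int) -> List[float]:
    """Average a greyscale image over an R x R grid of square cells, each zoom x zoom
    pixels, whose top-left corner is at column x, row y. Cells outside the image read 0.0."""
    height, width = len(image), len(image[0])
    cells = []
    for row in range(R):
        for col in range(R):
            y0, x0 = y + row * zoom, x + col * zoom
            block = [
                image[j][i]
                for j in range(max(y0, 0), min(y0 + zoom, height))
                for i in range(max(x0, 0), min(x0 + zoom, width))
            ]
            cells.append(sum(block) / len(block) if block else 0.0)
    return cells


def sensory_vector(image, x: int, y: int, zoom: int, R: int) -> List[float]:
    """S(t): the R*R retina values followed by X(t), Y(t) and Z(t)."""
    return retina_cells(image, x, y, zoom, R) + [float(x), float(y), float(zoom)]


def apply_motor(x: int, y: int, zoom: int, dx: int, dy: int, dz: int) -> Tuple[int, int, int]:
    """M(t) = (Dx, Dy, Dz) shifts the retina and changes its zoom (assumed minimum zoom of 1)."""
    return x + dx, y + dy, max(1, zoom + dz)


image = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
print(sensory_vector(image, 0, 0, zoom=2, R=2))
```

With a large zoom each cell summarizes a wide area of the image, and with zoom 1 each cell is a single pixel. This is the trade-off between fast scanning and detailed observation described in [0111].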

[0115] As for the previous embodiment, the self-developing device does not have any prior knowledge about the sensory-motor device with which it is associated. It must discover the structure of several couplings in this sensory-motor device; notably, it must learn to:

- understand the effects of Dx(t), Dy(t), Dz(t) on X(t+1), Y(t+1) and Z(t+1). This is a context-independent sensory-motor mastery, yet the constraints linking the 6 variables are rather complex, in particular due to the boundaries of the visual field.
- understand the relationship between Dx(t), Dy(t), Dz(t) and the values of the pixels on the retina. This depends on the particular images the system is exploring, but the system must be capable of discovering sensory-motor trajectories that should be applicable in different contexts.

[0116] In this embodiment, a reward function was used based only on the predictability motivational variable, P(t). It can be considered that there was a motivational vector of dimension 1 (corresponding only to the predictability variable):

$$Mot(t) = \begin{pmatrix} P(t) \end{pmatrix}$$

Minimizing predictability

[0117] In this experiment, the self-developing device used a reward function which assigned a higher reward as the value of P(t) decreased; in other words, the device seeks to minimize the value of the predictability variable. This means that the device tries to explore the sensory-motor pathways that it masters the least. As it explores these pathways they become more predictable, which leads the system to change its behaviour. A similar result could have been obtained using the reward function r_dec(v,t).

[0118] The experiment made use of a sequence of 200 image frames of a person talking, recorded using a video camera. During this sequence of images, the person's head, mouth, eyes and hands moved. This sequence of 200 frames was used as the input to the system. Figure 16 shows the behaviour of the retina for a sequence of images at given times during the experiment. The sequence of images in Fig. 16 runs from top-left to bottom-right. Values of the retina cells are shown in Fig. 16 using semi-transparent grayscale levels.

[0119] As can be seen from Fig. 16, the retina starts with a low zooming factor and progressively focuses on the upper part of the head of the person. The square seen at the upper left of the image reflects the value of P(t) (black: low predictability; white: high predictability).

[0120] The system was run for 2000 time steps (10 cycles of the input video sequence). Figure 17 shows the evolution of the average predictability <P(t)> during this period.
[0121] Despite the fact that the system seeks low predictability (which is why the curve regularly drops), the average predictability increases in the long run. This means that the system manages to predict the images better and better.

[0122] Figures 18 and 19 show the trajectories of the retina during the initial phase and at the end of the experiment, respectively. Both trajectories correspond to the same number of time steps, but the first one is much more widespread than the second. At the beginning of the experiment, the retina was scanning the whole image in a rather uniform manner. At the end of the experiment it seemed to concentrate on the moving parts (hands and head) because they proved to be the most difficult to predict. The retina was not preprogrammed to pay attention to moving objects; this ability emerged, driven by the system's motivation.

[0123] More precisely, Figure 20 shows the time the retina spent in the "face area" during this experiment. The "face area" was manually defined for each image of the person talking, and a note was made of the number of time steps in which the retina was within this zone. It appears that the system progressively focuses its attention on the face.

[0124] This second embodiment shows how a self-developing retina can autonomously develop the capability of focusing its attention on "interesting" aspects of its environment. The device is motivated by a general kind of "curiosity". Although in this experiment the system was illustrated using video images of a person talking, the system is not specialized for this kind of stimuli.

[0125] It will be seen from the above-described embodiments and experiments that the self-developing device architecture according to the present invention does indeed enable open-ended learning in different applications.



First-order, second-order and third-order couplings



[0126] [...] as it masters new sensory-motor know-how. This process can be viewed as the establishment of couplings of different kinds. Three kinds of couplings can be identified - first-order, second-order and third-order couplings - although they all rely on similar bootstrapping processes.

[0127] First-order couplings concern the interaction between a self-developing device and its environment (see Fig. 21, in which the circular symbol represents an SDD and the spiral symbol represents the environment). By conducting sensory-motor experiments, the device [...] and contextual knowledge about [...]. Initially, the device is immersed in a continuous flow of information and does not know how to act upon it in order to turn it into a structured world. The above-described experiments show how it can develop such sensory-motor know-how.

[0128] Second-order couplings are couplings between self-developing devices and concern the development of coordinated interactions like joint attention or simple communicative behaviour (see Fig. 22). If two self-developing devices share the same environment, their behaviour will have an effect on one another: they will engage in coupled developmental pathways.

[0129] Third-order couplings concern the coordination of second-order couplings to form a higher form of interaction (see Fig. 23). Examples of such couplings are complex dances, turn-taking or linguistic behaviours. If two self-developing devices have developed a sufficiently good interactional know-how, they can use this shared mastery to engage in more complex interaction which presupposes an initial repertoire of common interaction patterns.
[0130] The kind of self-developing devices described [...] these three kinds of couplings. Devices can be envisioned which will go through developmental pathways that would include the development of such complex competences, each new mastery building [...] competences [...] Figure 24.

[0131] Self-developing devices [...] provide a general solution to a large number of [...] problems. The motivation system that drives the development [...] is independent of the nature of the sensory-motor apparatus to be controlled. For this reason, the same developmental engine can be used [...] sensory-motor device [...] in [...] any other kind of device, for example computer games, new musical instruments, [...] houses, internet software agents [...].

[0132] The skilled person will readily appreciate, based on his common general knowledge and the contents of the [...] components and software routines to [...] functions described above. Accordingly, further details are not given here. Moreover, it will be understood that the [...] of functions between distinct devices as described above with [...]. Functions described as being performed by one specific device, for example the controller 10, may be performed by a different module depending upon the application. Furthermore, the self-developing device can be implemented using modules which combine functions which are ascribed above to separate components. Finally, various of the functions may be partially or [...] in software.
[0133] Although the present invention has been described above with reference to certain preferred embodiments thereof, it is to be understood that the invention is not limited by reference to the specific features of those preferred embodiments.
[0134] In particular, [...] specified.

[0135] Furthermore, although the preferred embodiments make use only of motivational variables of the preferred kinds (which are independent of any particular sensory-motor apparatus, and history-dependent), it is to be understood that the SDD of the invention could make use of a mix of motivational variables, some being motivational variables of the above preferred kind and others being more conventional, task-dependent motivational variables.
[0136] In a similar way, although the above-described [...] use of four preferred types of reward function, it is to be understood that other kinds of reward functions can be used in addition [...].
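The four reward-function types mentioned in [0136] correspond to claims 9 to 12 (a reward that maximizes the value, minimizes the value, maximizes the increase, or maximizes the decrease of a motivational variable), and claim 13 combines reward functions in a weighted sum. The sketch below illustrates one possible reading under stated assumptions: each motivational variable is assumed to lie in [0, 1] and to be stored as a list of past values, and the concrete formulas (for example r_min = 1 - v(t)) are illustrative choices, not formulas given in the text. All function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical sketch. A motivational variable is given as its history:
# v[-1] is its value at time t, v[-2] its value at time t-1.
# Assumption: motivational variables are normalized to [0, 1].
History = Sequence[float]


def r_max(v: History) -> float:
    """Rewards high current values of the variable."""
    return v[-1]


def r_min(v: History) -> float:
    """Rewards low current values of the variable (assumes v in [0, 1])."""
    return 1.0 - v[-1]


def r_inc(v: History) -> float:
    """Rewards an increase of the variable between t-1 and t."""
    return v[-1] - v[-2]


def r_dec(v: History) -> float:
    """Rewards a decrease of the variable between t-1 and t."""
    return v[-2] - v[-1]


REWARD_FUNCTIONS: Dict[str, Callable[[History], float]] = {
    "max": r_max,
    "min": r_min,
    "inc": r_inc,
    "dec": r_dec,
}


def overall_reward(histories: Dict[str, History],
                   weights: Dict[Tuple[str, str], float]) -> float:
    """Weighted sum of reward functions applied to motivational variables.

    `weights` maps (variable name, reward type) to a weight, e.g.
    {("P", "inc"): 1.0} rewards an increase in a variable named "P".
    """
    return sum(weight * REWARD_FUNCTIONS[kind](histories[name])
               for (name, kind), weight in weights.items())


# Usage: reward progress in predictability plus a high familiarity value.
reward = overall_reward({"P": [0.25, 0.75], "F": [0.5, 0.5]},
                        {("P", "inc"): 2.0, ("F", "max"): 1.0})
print(reward)  # 2.0 * 0.5 + 1.0 * 0.5 = 1.5
```

Pairing r_inc with a history-dependent variable such as predictability yields a reward proportional to the rate of change of that variable, which is the property the abstract associates with open-ended development.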



Claims 

1. A self-developing device comprising:

input means for determining the value of a set of one or more sensory-motor variables (S(t)) representative of the status of the environment;

control means (10) for outputting a set of one or more control signals to control action of a sensory-motor apparatus with which the self-developing device is associated in use;
a motivation module (11) for calculating a reward (R(v,t)) associated with a candidate value that can be taken by said set of control signals (M(t)); and

selection means (10) for deciding, based on reward values calculated by the motivation module (11), which value should be taken by said set of control signals (M(t)), the selection means controlling the control means to output the selected value;

wherein the motivation module (11) is adapted to evaluate rewards by calculating a function (R(v,t)) of at least one motivational variable (v) whose value is derived from at least one of the sensory-motor variables of said set [...];

characterized in that the motivation module (11) uses a computation device (15) adapted to perform a history-dependent calculation to calculate the value of said at least one motivational variable (v), said history-dependent calculation being dependent upon at least one of:

a) one or more time-varying internal parameters of the computation device (15) or of a device cooperating with the computation device in the computation of the at least one motivational variable (v); and

b) values (SM(t), SM(t-1)) taken at different times by said at least one sensory-motor variable of said set.

2. The self-developing device of claim 1, and comprising prediction means (12) for predicting the value of [...] (S(t)) at a first time (t) [...] sensory-motor variables (SM(t-1)) at an earlier time (t-1);

wherein the motivation module (11) is adapted to evaluate rewards by calculating a function R(v,t) of a predictability motivational variable (P(t)) [...] the prediction means (12).

3. The self-developing device according to claim 2, wherein the value of the predictability motivational variable (P(t)) at said first time (t) is calculated according to the following equation:

$$P(t) = 1 - e(SM(t-1), S(t)),$$

where e(SM(t-1), S(t)) is the prediction error of the prediction means (12) when predicting the value of said at least one sensory-motor variable at said first time (t) based on the value of said set of sensory-motor variables (SM(t-1)) at an earlier time (t-1).

4. A self-developing device according to claim 2 or 3, wherein the prediction means (12) comprises at least one device selected from the group comprising a recurrent Elman network [...]-based prediction system.

5. The self-developing device of claim 1, and comprising frequency-checking means for determining the frequency of a [...] from a first value of the set of sensory-motor variables [...] sensory-motor variables (S(t));

wherein the motivation module (11) is adapted to evaluate rewards by calculating a function R(v,t) of a familiarity motivational variable (F(t)) at said first time (t), the familiarity motivational variable (F(t)) being indicative of the frequency, as determined by the frequency-checking means, of the transition to the value (S(t)) of the set of sensory-motor variables at said first time (t) from the preceding value (SM(t-1)) of the set of sensory-motor variables at said earlier time (t-1).

6. The self-developing device of claim 5, wherein the value of the familiarity motivational variable (F(t)) at said first time (t) is calculated according to the following equation:

$$F(t) = f_T(SM(t-1), S(t)),$$

where f_T(SM(t-1), S(t)) is the number of times, during a time period of first length (T) preceding said first time (t), that the transition has occurred to the value (S(t)) of the set of sensory-motor variables at said first time (t) from the preceding value (SM(t-1)) of the set of sensory-motor variables.

7. The self-developing device of claim 1, and comprising averaging means for determining the average (<s_i>_T) of one of said sensory-motor variables (s_i) over an interval of said first length (T);

wherein the motivation module (11) is adapted to evaluate rewards by calculating a function R(v,t) of a stability motivational variable (σ_i(t)) indicative of how close the value of said sensory-motor variable (s_i) at said first time (t) is to the average (<s_i>_T) over a preceding interval of said first length (T).

8. The self-developing device of claim 7, wherein the value of the stability motivational variable (σ_i(t)) at said first time (t) is calculated according to the following equation:

$$\sigma_i(t) = 1 - \sqrt{\left(s_i(t) - \langle s_i \rangle_T\right)^2},$$

where s_i(t) is the value of said sensory-motor variable at said first time (t).

9. The self-developing device of any one of claims 1 to 8, wherein the motivation module (11) is adapted to apply a reward function (r_max(v,t)) which generates a reward which maximizes the value of said at least one motivational variable (v).

10. The self-developing device of any one of claims 1 to 8, wherein the motivation module (11) is adapted to apply a reward function (r_min(v,t)) which generates a reward which minimizes the value of said at least one motivational variable (v).

11. The self-developing device of any one of claims 1 to 8, wherein the motivation module (11) is adapted to apply a reward function (r_inc(v,t)) which generates a reward which maximizes the increase in said at least one motivational variable (v).

12. The self-developing device of any one of claims 1 to 8, wherein the motivation module (11) is adapted to apply a reward function (r_dec(v,t)) which generates a reward which maximizes the decrease in said at least one motivational variable (v).

13. The self-developing device of any previous claim, wherein the [...] two or more reward functions [...] on a weighted sum of said two or more reward functions, an overall reward (R(M(t))) associated with a candidate value (m) of said set of control signals (M(t)).

14. The self-developing device of any previous claim, and comprising a prediction module (12) for [...] the sensory-motor [...] (SM(t), mot);

wherein, for each of a plurality of [...] values of said control signals, the prediction module [...] (SM(t), mot) [...] a series of future instants; and

the motivation module (11) is adapted to calculate, for each of the plurality of candidate values (m), a reward value (R(m)) [...] a series of expected rewards [...] values of the motivational variables at said series of future instants.

15. A sensory-motor apparatus being an autonomous software or hardware agent (2) and comprising:

the self-developing device (1) of any previous claim;

a set of one or more sensors (S, IR) adapted to sense the properties of the environment, comprising the environment internal and external to the agent (2); and

means (A) for acting on the environment in accordance with the control signals (M(t)) output by the control means (10) of the self-developing device (1);

wherein the set of sensory-motor variables (SM(t)) [...] to the output from said set of sensors (S, IR).
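The history-dependent motivational variables defined in claims 3, 6 and 8, together with the reward-based selection step of claim 1, can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the prediction error e is assumed to be supplied already normalized to [0, 1] by some predictor (claim 4 mentions an Elman network, which is not reproduced here), sensory-motor values are assumed hashable so that transitions can be counted, the window length T is measured in time steps, and all names are hypothetical.

```python
import math
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, Sequence, Tuple, TypeVar

M = TypeVar("M")  # type of a candidate control value m


def predictability(error: float) -> float:
    """P(t) = 1 - e(SM(t-1), S(t)); the error is assumed to lie in [0, 1]."""
    return 1.0 - error


class FamiliarityCounter:
    """F(t) = f_T(SM(t-1), S(t)), counted over the last T transitions."""

    def __init__(self, T: int) -> None:
        # A deque with maxlen drops the oldest transition once T are stored.
        self.window: Deque[Tuple[Hashable, Hashable]] = deque(maxlen=T)

    def observe(self, sm_prev: Hashable, s_now: Hashable) -> int:
        """Return how often this transition occurred in the preceding window,
        then record it. (Excluding the current occurrence is an assumption.)"""
        transition = (sm_prev, s_now)
        count = sum(1 for seen in self.window if seen == transition)
        self.window.append(transition)
        return count


def stability(history: Sequence[float], s_now: float, T: int) -> float:
    """sigma_i(t) = 1 - sqrt((s_i(t) - <s_i>_T)^2), where <s_i>_T is the mean
    of the last T values in `history` (values observed before time t)."""
    recent = history[-T:]
    mean = sum(recent) / len(recent)
    return 1.0 - math.sqrt((s_now - mean) ** 2)


def select_action(candidates: Iterable[M],
                  expected_reward: Callable[[M], float]) -> M:
    """Selection step: keep the candidate control value with the highest reward."""
    return max(candidates, key=expected_reward)


# Usage
counter = FamiliarityCounter(T=3)
counts = [counter.observe("x", "y"), counter.observe("x", "y"),
          counter.observe("x", "z"), counter.observe("x", "y")]
print(counts)                            # [0, 1, 0, 2]
print(predictability(0.25))              # 0.75
print(stability([0.0, 1.0], 0.25, T=2))  # 0.75
rewards = {"a": 0.1, "b": 0.9, "c": 0.5}
print(select_action(rewards, lambda m: rewards[m]))  # b
```

Because the square root of a square is an absolute value, the stability variable equals 1 minus the absolute deviation from the recent mean; it stays within [0, 1] only if the sensory-motor variable is itself normalized to that range.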
FIG. 1B